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Abstract 

In structural brain networks the connections of interest consist of white-matter fibre bun- 
dles between spatially segregated brain regions. The presence, location and orientation of 
these white matter tracts can be derived using diffusion MRI in combination with probabilis- 
tic tractography. Unfortunately, as of yet no approaches have been suggested that provide 
an undisputed way of inferring brain networks from tractography. In this paper, we provide 
a computational framework which we refer to as Bayesian connectomics. Rather than apply- 
ing an arbitrary threshold to obtain a single network, we consider the posterior distribution 
of networks that are supported by the data, combined with an exponential random graph 
(ERGM) prior that captures a priori knowledge concerning the graph-theoretical properties 
of whole-brain networks. We show that, on simulated probabilistic tractography data, our 
approach is able to reconstruct whole-brain networks. In addition, our approach directly 
supports multi-model data fusion and group-level network inference. 



1 Introduction 



Human behaviour ultimately arises through the interactions between multiple brain regions that 
together form a network that can be characterized in terms of structural, functional and effective 
connectivity (jPenny et al.l , 120061 ). The science of characterizing neural connectivity, that is, to 



elucidate the wiring diagram of the human brain, has come to b e known as connectomics and is 



seen as one of the great challenges in neuroscience (|SDorns et al.l . l2005i) 



Structural connectivity presupposes the presence of white-matter tracts that connect spatially 
segregated brain regions which constrain the functional and effective connectivity between these 
regions. Hence, structural connectivity provides the scaffolding that is required to shape neu- 
ronal dynamics. Furthermore, structural connectivity has been related to several dise ases, such 



as Alzheime r's disease, attention deficit hyperactivity disorder and multiple sclerosis (|He et al 
[2OO8 . 200^ iKonrad and Eickho and is therefore also of major importance in clinical 

neuroscience (ICatanil . 12007^ 



Currently, the only way to estimate structural connectivity of whole-brain networks in vivo 
is through the use of diffusion imaging; a variant of magnetic resonance imaging which measures 
the restricted diffusion of water molecules, thereby providing an indirect measure of the presence 
and orientation of white-matter tracts. By following the principal diffusion direction in individual 
voxels, streamlines can be drawn that represent the structure of fibre bundles co nnecting sepa- 
rate r egions of grey mat ter. This process is known as deterministic tractography (jChung et al.l . 
2OIOI : IZhang et al.ll2005l ). There are however several problems associated with this deterministic 



approach as it cannot easily deal with various ambiguous but common fiber con figurations such 
as kissing, crossing and splaying fibre bundles ("jbabdi and Johansen- 



A s a so l ution to this problem, probabilist i c trac tography has been introduced (jBehrens et al 



2007Ll2003t iFriman et al.l [ |2006l:IJbabdi et alJ . l2007i) . By repeating the stream ining process multi 



pie times, probabilistic tractography ultimately produces for each voxel a measure of uncertainty 
about in which other voxel a hypothesized connection will terminate. A benefit of the proba- 
bilistic approach is that it can deal with ambiguous fiber configurations by taking streamlining 
uncertainty into account and produces probability estimates concerning the presence or absence 
of particular white-matter tracts. 

Often, one is not interested so much in particular tracts but rather in whole-brain struc- 
tural connectivity. Several appro aches have b een suggeste d to deriv e whole-brain networ ks from 



probabilistic tractography res ults (jChung et al., . ,2011.: Hag mann et al .. 2007; Iturria-Mcdin a et al 
20081 : ISkudlarski et al.L 12008 ). The resulting networks are all reported to exhibit certain graph- 
theoretical properties, s uch as a characteristic short path length, clu stering, the existence of hubs 
and modular structure (iBassett et al. . 2011 : Bassett and BuUmord 2006; Bullmore and Sporns . 
20091: iGong et al.l . |2009 [ Spornsl . l2006l ). Unfortunately, the inference of whole-brain networks from 
probabilistic tractography estimates remains somewhat ad hoc. Networks are often constructed 
by counting the number of strear nlines that coii i iect r egions of interest and interpreting this as an 



undirected, weighted graph (e.g. ( Zaleskv et al. . 2010t )). Alternatively, a binary graph is obtained 



by si mply adding an edge f or each pair of regions t hat have one o r more streamlines between them 
(e.g. (|Bassett et al.l . l201lt iHagmann erall . l2007t IVaessen et all . I2OIOI )). It is hard, however, to 
maintain a direct correspondence between connection str ength and the probability distribu tion 
over tracts that is produced by probabilistic tractography (jJbabdi and Johansen- Bereil201lh . 

Another important observation is that the mentioned approaches do not easily support the 
integration of probabilistic streamlining data with other datasets. This is a particularly relevant 
topic in contemporary neuroscience since it is assumed by multi-modal data fusion, where data 
acquired via different recording modalities is integrated to provide a coherent picture of brain 
function ( Horwitz and Poeppeil2Q0^ . and required by group-level inference, where the interest is 
in estimating a network that char acterizes a part i cular p opulation, e.g. when comparing patients 



with controls in a clinical setting ( Simpson et al. . r2011bfl 



In the following, we provide for the first time a computational framework for the estimation of 
whole-brain networks from probabilistic tractography outcomes. In our approach, which we refer 
to as Bayesian connectomics, we consider the posterior distribution of graphs that are supported 
by our data, instead of a generating a single network based on an arbitrary threshold. Our 
approach relies on defining a generative model for whole-brain networks which i s inspired by 
recent work on network inference in systems biology (Mukheriee and Speedl . I2OQ8I) and consists 
of two ingredients. First, a likelihood model based on a multivariate distribution which views 
the streamline distributions produced by probabilistic tractography as noisy data. Second, an 
exponential random graph prior which models prior knowledge concerning the graph-theoretical 
properties of whole-brain networks. The extension to multi-modal data fusion or group-level 
inference follows by defining the likelihood model as a product of multivariate distributions. In 
order to validate our methodology we make use of simulated data which allows us to compare the 
estimates produced by Bayesian connectomics with ground truth. We show that our approach 
is able to reconstruct whole-brain networks from simulated data as produced by probabilistic 
tractography. 



2 Bayesian inference of whole-brain networks 

In this section we derive our Bayesian approach to the inference of whole-brain networks, referred 
to as Bayesian connectomics. We start by defining the likelihood term for our generative model. 
Subsequently, an exponential random graph prior will be defined which can be used to capture the 
essential graph-theoretical properties of whole-brain networks. Finally, we derive a Markov chain 
Monte-Carlo algorithm to sample from the posterior distribution of whole-brain networks. 
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2.1 Likelihood model 



Assume we want to infer the connectivity between K brain regions. That is, we want to find a 
graph G € Q where Q is the family of undirected graphs. The graph G — {V, E) consists of vertices 
V — {Vi, . . . , Vk} and edges E CV xV where loops {Vi.Vi) for i e {1, . . . , if} are excluded from 
the edge set. We use = 1 when {Vi.Vj) e E and e^j = when [Vi, Vj) ^ E. 

We start by considering one region and the possible targets in which a postulated tract 
may terminate. Probabilistic tractography will produce a distribution over target regions — 
{riii , . . . , riifc)'^ by drawing S streamlines, Ni — X^fcLi l£ S oi them ending up in a target 
region. Let nu = and assume that all streamlines are drawn independently. The probability of 
finding a particular distribution is expressed as a multinomial distribution 



P (n, I X, 



K 

n 



(1) 



in which Xij with Xij — 1 represents the probability of drawing a streamline from region i to 
region j. This streamlining probability itself depends on whether or not there actually exists a 
tract between region i and region j. We model the distribution of streamlining probabilities using 
a Dirichlet distribution 

K 

P(x, |a,)ocn<'^"' (2) 



where the 



Cija + (1 - etj)a 



(3) 



can be interpreted as pseudo-counts that determine the probability of streamlining from region 
i to region j when an edge Cij in the underlying graph G is either present (a) or absent (a). 
Because the Dirichlet prior with parameters — {an, . . . ,aiK) is conjugate to the multinomial 
distribution, the posterior distribution of tract probabilities is again Dirichlet such that 



K 



P (xi I n,,aLi) (X P{ni \ Xi)P{xi | a^) oc ]^ : 



71 ij -\- Oi i 



(4) 



Let N = (ni;...;n/f) represent the probabilistic tractography data, X = (xi; . . . ; x/j-) the 
streamlining probabilities, and A = (ai; . . . ; slk) the parameters of the Dirichlet distribution, for 
all brain regions 1 < i < K combined. The likelihood of the graph G is then expressed as 



P (N I G) = y P (N I X) P (X I A) dX 

r(E,a^,) 



n 



n 



r(ai 



(5) 



where we have replaced A with G on the left-hand side to emphasize that we are interested in a 
probability distribution over a set of graphs. Note that A and G are easily exchanged through 
Eq. (131). The second step in Eq. ([S]) follows by recognizing P (N | G) as a product of compound 
Dirichlet-Multinomial distributions, also known as multivariate Polya distributions (|Minka . i2003h . 
This leads to the log-likelihood 



PslogP(N|G)-^ 



log 



log- 



^log^^"'^' 



(6) 
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2.2 Exponential random graph prior 

We would like to define a prior such that, when sampling from the prior, we generate reasonable 
whole-brain networks. Here, reasonable is taken to mean that the networks should respect graph- 
theoretical properties that have been inferred by previous studies in connectomics. One way to 
construct a prior that has this property is using the theory of exponential random gr aph models 
(ERGMs) ([Frank and Strausd . Il986l: iHolland and Lein hardt; 'l98l': 'Wasserman, 'l996|) . ER GMs, 
developed in the context of social network a nalysis (|Snii ders_et_ al., 2006.; Snijdersl, 120021) and 
systems biology (iMukheriee and Speed . 20081) . have recently been used to fit models of whole- 
brain networks ( Simpson et al.l . '2011allb[ ). However, they have not been used before as priors in a 
generative model of whole-brain networks. 

The exponential random graph prior is defined as a log-linear distribution 



P(G) cxexp (a ^u;,/,(G) 



(7) 



in which A is a strength parameter and the parameters Wi tune the relative strengths of the 
individual concordance functions fi'.Q^M. that capture several graph-theoretical properties of 
whole-brain networks. 

For illustrative purposes, we will use the prior in order to capture the small-world phenomenon 
that occurs so frequently in brain networks ( Spornsi 2010t ). That is, we wish to generate networks 
that have short average path length and a large clustering coefficient. This can be achieved by 
starting with a graph whose neighboring vertices are connected and rewiring edges at random with 
a certain probability ( Watts and StroeaM 1998 ). The choice of functions fi is restricted however 
by the Markov chain Monte Carlo sampling algorithm since, as we will note in Section 12.31 the 
change after each sampling step must be calculated efficiently. Unfortunately, this prevents us from 
using average path length as as a concordance function since the change in path length due to an 
edge mutation cannot be calculated locally. It requires an all-pairs shortest paths algorithm, for 
which the fastest - Dij kstra's algo rithm - is known to have time complexity 0(|i?|-|y|-|-|y|^logiy|) 
on sparse graphs (|Diikstral . ll959l ). Instead, as a surrogate model, we use a prior that prefers graphs 



which are characterized by a small number of short edges, 
between vertices i and j, then the prior is defined as 

P(G)«exp(/3/(G)) 



Let d;, be the Euclidian distance 



(8) 



with j3 = X-w and concordance function /(G) = — J2i j ^ijdij which assumes that the edge length 
follows an exponential distribution. In Section [31 we will show that this prior prefers graphs which 
exhibit the small-world property. 

2.3 Sampling from the posterior distribution 

By using the likelihood and the prior in conjunction with Bayes' rule, we obtain the following 
posterior distribution over whole-brain networks: 



P(G I N) ocP(N I G)P(G) 



(9) 



Since the posterior P {G \ N) cannot be calculated analytically, we use Metropolis Markov chai n 
Monte Carlo (MCMC) sampling to approximate this distribution ( Mukheriee and Soeedl . 2008). 
The acceptance of each sample G' in the sampling chain is determined by the ratio 



P [C I N) 
P(G I N) 



(10) 



A proposed graph becomes a new sample with probability min(l, a). The acceptance probability 
of a suggested sample can be calculated as 



log a = ALfe; + A^ti;,A/,(G) 



(11) 
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with ALki the change in log-hkelihood after flipping edge cm and Afi{G) — fi{G') — fi{G). This 
requires that we can efficiently update both the likelihood and the prior for new samples in the 
Markov chain. 

The change in log-likelihood as a consequence of the flipping of an edge Cki to 1 — e^j is defined 

as 

ALfci=logP(N|G')-logP(N|G) , (12) 

with the sole difference that the parameter aki associated with G' and denoted by iki is given by 
o-ki = 0,1k = (1 ~ eki)o- + BkiQi with Cki the edge as defined in G. Plugging 



AL 



kl 



loe 



log 



r {hki + ukiY 

r {au + nki)_ 



loe 



nkj. 



r {aik + nik) 
r {aik + nik) 



log 



into (112]) yields 



lo: 



r 



loe 



2 log 



r(a;fe) 



(13) 



The change in the log prior as a consequence of the flipping of an edge Cki to 1 — Cki depends 
on the employed concordance function(s). The concordance function used in ([5]) factorizes over 
individual edges: 

P(G) cxj^exp [-/3 ] (14) 



such that A/(G) = {2eki - 1)4; • 

All configurations of the 2''^(^~i)/^ possible graphs have a probability greater than zero to be 
constructed thus guaranteeing that the Markov chain is irreducible. The collection of accepted 
samples {G^^\ . . . , G'"^^} forms an approximation of the posterior P (G | N). When we desire to 
generate a whole-brain network, the maximum a posterior graph 



Gmap = argmaxjlogP (N | G) + logP (G)} 



(15) 



can be used. Alternatively, the samples can be used to estimate posterior probabilities of network 
features, such as the existence of a tract. Assuming the Markov chain has converged, the posterior 
probability of a single tract is given by 



(16) 



Other quantities may be estimated in a similar fashion. 



2.4 Dealing with multiple datasets 

As pointed out previously, probabilistic streamlining data need not be collected in isolation. It may 
be collected in conjunction with other data such as resting-state fMRI or MEG data or it may be 
collected for multiple subjects. We show that our approach quite naturally supports multi-modal 
data fusion and/or group- level inference. 

Suppose we have acquired M datasets Yi, . . . , Ym- The goal is to derive the graph G which 
best explains these data. The posterior for G will be of the form 



P(G I Yi,...,Ym) (xP(Yi,...,Ya, I G)P(G) 



(17) 



Conditional on G, we assume that the datasets are independent, such that we have the following 
factorized representation of the posterior: 



P(G| Yi,...,Ym)« 



M 



n^(Y, |G) 



P(G) 



(18) 
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In case of multi-modal data fusion this requires that we are able to write down a likelihood 
model for each of the datasets Y^. In case of group- level inference, represents probabilistic 
streamlining data for subject i. Note that this is not the same as simply summing the streamlines, 
by virtue of Eq. ([S]). It follows from the factorization implied by that, in order to run the 
MCMC algorithm at the group level, all that is required is to use the following term for the change 
in log-likelihood when computing the acceptance probability: 

M 

ALki = [log P (N, I G') - log P (N, I G)] . (19) 

1=1 



3 Simulation study 



In our simulation study, we used the Watts & Strogatz model ( Watts and Stroeatz . 19981) in 
order to generate a small-world graph G consisting of 20 edges and 20 vertices, laid out in a 
circular arrangement that defines the physical distance between the nodes. For this ground-truth 
network, we generated 'probabilistic streamlining' data as follows. For each node i a streamlining 
probability vector x was constructed by drawing from a Dirichlet distribution with parameters 
= (flii) ■ • ■ , OiAt)- Parameters were set to zero except for those aij for which G E (the true 
tracts) as well as for a randomly selected Uik for each 1 < i < 20 (representing a false positive 
due to kissing, crossing or splaying fiber bundles). These parameters were all set equal to 20. 
Subsequently, for each node i the streamlining probability vectors x^ were used to generate 1000 
streamlines drawn from a multinomial distribution. Figure [T] shows G together with the generated 
probabilistic streamlining data S. 




node 



Figure 1: Simulated data. A. Small-world graph G. B. Probabilistic streamlining data generated 
for graph G indicating the number of streamlines drawn between nodes i and j. 



Next, we wanted to determine whether the ground-truth network could be retrieved using 
Bayesian connectomics. That is, can we reconstruct G from probabilistic streamlining data S 
using our MCMC algorithm, does the exploitation of prior information help in retrieving the 
original network, and does it outperform a conventional thresholding approach? We ran the 
MCMC algorithm using five parallel chains, a burn-in of 2000 steps and an effective sample size 
of 8000. For the hyperparameters, we used a = 20 and a = 15 to indicate that absent edges 
may still have a substantial probability of being streamlined. For our prior, we used (5 — 2 as 
this generated networks whose number of edges agreed with what one expects if the underlying 
network is generated by the Watts & Strogatz model. 

Figure [2] shows the true connectivity as well as the reconstruction results for a conventional 
thresholding approach (containing those edges for which the number of streamlines exceeds zero) , 
the maximum likelihood solution, which is obtained by running the MCMC algorithm with a flat 
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true connectivity thresholding 





Figure 2: Comparison between the true connectivity and the connectivity obtained with a thresh- 
olding approach, a sample from the prior, the maximum likelihood (ML) solution and the maximum 
a posteriori (MAP) solution afforded by Bayesian connectomics. 



prior, and the maximum a posteriori solution, which is obtained by running the MCMC algorithm 
with an informed prior. The latter is the procedure we refer to as Bayesian connectomics. Clearly, 
both the thresholding approach and the maximum likelihood solution suffer from the false positives 
introduced during the streamlining process. The MAP solution, in contrast, is able to retrieve the 
true connectivity almost perfectly well. Note that this cannot be explained by the use of the prior 
alone since the prior assumes that the empty graph is the most probable graph and samples from 
the prior are not likely to recover the long-range connections. 



4 Discussion 

Diffusion imaging promises to reveal the the wiring diagram of the human brain in vivo as achieved 
by probabilistic tractography. So far, the generation of whole-brain networks from probabilistic 
tractography data has relied on heuristic thresholding methods. In this paper we propose Bayesian 
connectomics as a principled framework that may replace these thresholding methods. The frame- 
work allows for the exploitation of prior knowledge when deriving the true connectivity based 
on inherently noisy probabilistic streamlining data. An additional advantage of Bayesian connec- 
tomics is that it easily allows dealing with multiple datasets as is the case with multi-modal data 
fusion or group-level inference. We have shown that our approach outperforms thresholding in 
retrieving the true connectivity underlying simulated probabilistic streamlining data. 

In the simulation, we made a number of assumptions. First, we assumed that errors during 
probabilistic streamlining arise through the existence of a restricted number of false positives which 
arise due to the existence of a few kissing, crossing and splaying fibers. It would be interesting to 
examine how our approach behaves under other, possibly more realistic, assumptions about how 
probabilistic streamlining data is generated. Furthermore, we assumed that the hyperparameters 
a, a and /? were known. The hyperparameters of the Dirichlet distribution are unknown in practice, 
but these could be chosen based on the distribution of the number of streamlines that are generated 
during the streamlining process. The parameter /3, which determined the influence of the prior 
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was chosen such that samples from the prior corresponded to the small-world networks we were 
interested in. In order to construct priors which work well in practice, various other concordance 
functions can be empl oyed, whose parame t ers can be estimated o n existing data using maximum 
likelihood approaches ( Hunter et al. . 20081 Simpson et al. . 2011a[) . 

In this paper, we have laid the foundations for a Bayesian approach to connectivity analysis. 
What remains is to validate the approach not only on simulated data but also on empirical data. 
This not only requires the reconstruction of whole-brain networks from diffusion imaging data, but 
also an assessment of their validity on independent data. One may think here of determining how 
well reconst ructed networks correspond to ne tworks estimated from resting-state fMRI or from 
tracer data ( Jbabdi and Johansen-Ber5 201ll) . Eventually, we would like to use our approach to 
determine how differences between (groups of) people can be understood in terms of variations in 
their underlying wiring diagrams. 
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